This work examines the potential connections between extreme value
statistics, problems in aerosol science, and a recent technique for
solving ill-posed inversion problems called EVE (Extreme Value
Estimation). EVE estimates a functional of the unknown solution by
searching for the extreme (maximum and minimum) values of that
functional within a set of acceptable solutions. The statistics of
occurrence of extreme values in real life were not considered when this
method was developed. The results of this technique are more
conservative than those of the other methods used to solve the problem
of aerosol size distribution estimation, such as non-linear least
squares, expectation-maximization, and regularization. The use of the
customary methods of deconvolution may lead to an underestimation of
the possibility of occurrence of extreme values in real life. It is
suggested that consideration of extreme value statistics might aid in
better defining the limits to be placed on the physically acceptable
solutions in the EVE deconvolution. Other problems could also benefit
from the application of extreme value statistics, including the
estimation of the second highest value of measured airborne particle
mass in the context of the ambient air quality standard for particulate
matter less than 10 µm and the determination of the Maximally Exposed
Individual as required under the 1990 revisions to the Clean Air Act.

1. Introduction

Although extreme value statistics has been ap- 
plied to environmental phenomena such as maxi- 
mum wind speed and wave heights, it has not been 
applied to air pollution regulations, concentration 
estimation, or other related problems. Since many 
of the problems related to the effects of pollution 
on public health and welfare are dependent on the 
high end of the distribution of concentrations and/ 
or exposures, there appears to be an opportunity to 
bring the developments in extreme value statistics 
to an area that could make good use of such meth- 
ods. In this paper, three possible applications of 
extreme value statistics will be presented with the 
hope of sparking interest in bringing these tools to 
bear on some difficult but interesting problems. 



2. Aerosol Size Distribution Estimation 

One common problem in aerosol science is the estimation of the aerosol
particle size distribution from measurements of the particles'
aerodynamic behavior (penetration or deposition) through a separation
device. For small particles (<300 nm), the penetration through a device
is governed by the particle's diffusivity, while for large particles
(>300 nm), inertial impaction is the usual separation mechanism. The
response of the device is known either by calculation or by measurement
using particles of known size. For the unknown aerosol, the penetration
is measured through a series of stages that sequentially remove
additional particles. From the known characteristics and a limited
number of measurements, the size distribution of the aerosol is
estimated. In general there are fewer measurements than parameters to
be estimated, and there can be collinearity problems in the penetration
matrix describing the instrument that complicate the problem further.
There are a number of conventional approaches to providing a solution,
but since the problem is underdetermined, one cannot ensure that they
will provide the true solution. It is also difficult to estimate error
bounds for these solutions.

2.1 Conventional Methods 

The observed sequence of particle concentrations 
penetrating through each stage of a size segregating 
device contains information on the size distribution 
of that aerosol. In general, the number of particles 
penetrating through a given stage of the system can 
be expressed by 



$$ N_i = N_0 \int P(i, d_p)\, f(d_p)\, \mathrm{d}d_p + e_i \tag{1} $$

where $N_i$ is the concentration penetrating through the $i$th stage,
$P(i, d_p)$ is the known particle size penetration characteristic for
particles of diameter $d_p$ through stage $i$, $f(d_p)$ is the size
distribution function to be estimated, and $e_i$ is the error in
fitting the measurement.

The normal approach to solving this equation is 
to express it as a series of linear, simultaneous 
equations relating the particle penetration fraction 
to discrete values of the size distribution and the 
stage penetration functions. 



$$ N_i = \sum_{j=1}^{J} P_{ij} f_j + e_i, \qquad i = 1, \ldots, I \tag{2} $$

where $I$ is the number of stages in the device, $J$ is the number of
size interval midpoints in the distribution, $P_{ij}$ is the
penetration of the $j$th particle size, $d_p(j)$, through the $i$th
stage, and $N_i$ is the number of particles penetrating the $i$th
stage. The $f_j$ values must be nonnegative. However, there is
generally no other objective a priori information on the nature of the
distributions. The size distribution is not normalized so that



$$ N_0 = \sum_{j=1}^{J} f_j \tag{3} $$

where $N_0$ is the total airborne concentration that is being
partitioned into the various size intervals. Equation (2) can be
rewritten in matrix form,



$$ \mathbf{N} = \mathbf{P} \cdot \mathbf{f} + \mathbf{E} \tag{4} $$



If $I$ is greater than or equal to $J$, then the problem is
overdetermined and can be solved for a unique solution using methods
such as least squares. However, because the size distributions
typically cover several orders of magnitude in particle diameter, it is
normally necessary to estimate more midpoint values than there are
measurements ($I < J$). There is then no unique solution to the
problem.
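
The lack of uniqueness when the number of stages is smaller than the number of size intervals can be illustrated with a small numerical sketch. The penetration matrix and size distributions below are hypothetical values chosen only for illustration (they do not describe any real instrument), and the error term is set to zero. Two different nonnegative distributions that differ by a vector in the null space of the penetration matrix reproduce exactly the same stage counts.

```python
def forward(P, f):
    """Noise-free stage counts N_i = sum_j P[i][j] * f[j] (discretized model)."""
    return [sum(p_ij * f_j for p_ij, f_j in zip(row, f)) for row in P]

# Hypothetical penetration matrix: I = 2 stages, J = 3 size intervals.
P = [
    [1.0, 0.75, 0.5],
    [0.5, 0.25, 0.125],
]

# A nonnegative trial size distribution (arbitrary illustrative numbers).
f_a = [100.0, 200.0, 300.0]

# v lies in the null space of P, so adding any multiple of it leaves N unchanged.
v = [-1.0, 4.0, -4.0]
f_b = [a + 20.0 * dv for a, dv in zip(f_a, v)]  # still nonnegative

N_a = forward(P, f_a)
N_b = forward(P, f_b)
print(f_a, "->", N_a)
print(f_b, "->", N_b)
```

Both distributions are physically admissible and fit the measurements equally well, which is why an additional criterion or constraint is needed to select among them.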

Because collection by diffusion varies slowly with 
particle size, the penetration values for adjacent 
size ranges are often quite similar to one another. 
The penetration functions for a screen diffusion 
battery used for separating particles in the 0.5 nm to 
500 nm range generally have substantial collinearity 
and thus, the problem is ill-conditioned as well as 
underdetermined [1]. Phillips [2] concluded that direct inversion of
these equations rarely produces physically acceptable solutions.

Two techniques for solving the ill-posed set of 
equations have been developed by Twomey [3] and 
by Maher and Laird [4]. There is limited theoretical 
justification for these methods. In practice, however, they have been
widely used in the aerosol field
with satisfactory results in many cases. Different
variations of the Twomey algorithm have been pro- 
posed (e.g., [5]). 

Other approaches have sought specific solutions 
within the feasible solution space by incorporating 
additional constraints on the problem. For example, 
Wolfenbarger and Seinfeld [6] assume that the distribution is fully
smooth from one interval to another. However, it is certainly possible
to have aerosol sources that produce particles with a very narrow
initial distribution, and thus the overall aerosol size distribution
may not be truly smooth. Thus, in all of these solution methods, a
solution, but not necessarily the solution, will be obtained.

2.2 Extreme Value Estimation 

Replogle et al. [7] initially suggested the concept 
that the primary "solution" is the set of all those 
points that could produce the observed values. 
Paatero [8,9] recognized that this approach could be applied to the
aerosol inversion problem by considering a one-to-many mapping of the
measured $N$ onto $f$ such that there is a set $D(N)$ of possible
solutions corresponding to each possible measured $N$. The set $D(N)$
is defined as the collection of all solutions $f$ that allow the
reproduction of the measured $N$ by Eq. (4) when reasonable values are
used for $E$. The true unknown solution $f$ is then a member of the set
$D(N)$ with high probability. $D(N)$ is then the set of acceptable
solutions.

To initiate the analysis, a best fit, $f_0$, is calculated such that
the nonnegativity constraints are satisfied. Additional solutions are
calculated that are sufficiently close to the best fit estimate that
they fall within a criterion for acceptable solutions. For each of the
estimated quantities, the largest and smallest values within the set
$D(N)$ are taken as the bounds of the confidence interval in which the
true solution will fall with some high probability.

The question is then how to define what solutions are acceptable. The
likelihood function, $L(N, f)$, is the probability of observing $N$
when $f$ is given. It will be assumed that



$$ -\ln(L) = \mathrm{const} \cdot \sum_{i=1}^{I} e_i^{2} = \mathrm{const} \cdot Q(f) \tag{5} $$



so that $Q(f)$ is the sum-of-squares for the case in which $f$ and $N$
are substituted into Eq. (1). The optimum solution would then be the
one that maximizes $L$ or minimizes $Q$. The minimum $Q$ value is
denoted $Q_0$, corresponding to $f_0$. Maintaining the nonnegativity
constraints, the members of the acceptable solution set, $D$, must be
such that

$$ \ln[L(f)] \ge \ln[L(f_0)] - \mathrm{const} \cdot K \tag{6} $$

or alternatively,

$$ Q(f) \le Q_0 + K \tag{7} $$

where $K$ is a confidence parameter with a typical
value of 3. In this way, the set of acceptable solu- 
tions of the original equation that fit sufficiently 
well are determined. In estimating the effects of ex- 
posure to this airborne activity, it may be of interest 
to estimate a function of the distribution. The dose 
to cells in the bronchial epithelium could be calcu- 
lated by 

$$ g[f(d_p)] = \int G(d_p)\, f(d_p)\, \mathrm{d}d_p \tag{8} $$

where $G(d_p)$ is the dose per unit airborne alpha activity in the size
range $d_p$ to $d_p + \mathrm{d}d_p$ [10]. To examine the original
distribution, the cumulative sums are estimated as represented by the
following sequence of functionals:






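$$ F(d) = \int G_d(d_p)\, f(d_p)\, \mathrm{d}d_p, \qquad G_d(d_p) = \begin{cases} 1 & \text{if } d_p \le d \\ 0 & \text{if } d_p > d \end{cases} \tag{9} $$

where $F(d)$ is the cumulative size distribution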
for the aerosol. The EVE(P) approach estimates 
such functionals by determining their confidence in- 
tervals. 
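
As a rough illustration of the idea, the following sketch searches a coarse grid of nonnegative trial distributions, keeps those whose sum-of-squares lies within K of the minimum, and reports the smallest and largest values of a linear functional over that acceptable set. It is not the EVE(P) algorithm itself: the brute-force grid search, the penetration matrix, the measurement uncertainties `sigma` used to scale the residuals, and the functional weights `G` are all assumptions made for this example.

```python
from itertools import product

def q_value(P, N, sigma, f):
    """Sum of squared residuals, scaled by assumed uncertainties, for trial f."""
    q = 0.0
    for row, n_i, s_i in zip(P, N, sigma):
        model = sum(p * x for p, x in zip(row, f))
        q += ((n_i - model) / s_i) ** 2
    return q

def eve_bounds(P, N, sigma, G, grid, K=3.0):
    """Extreme values of g(f) = sum_j G[j] * f[j] over all grid points
    that satisfy Q(f) <= Q0 + K (brute-force search, illustration only)."""
    trials = [(q_value(P, N, sigma, f), f)
              for f in product(grid, repeat=len(P[0]))]
    q0 = min(q for q, _ in trials)
    accepted = [f for q, f in trials if q <= q0 + K]
    g_values = [sum(g * x for g, x in zip(G, f)) for f in accepted]
    return q0, min(g_values), max(g_values)

# Hypothetical two-stage, three-interval problem.
P = [[1.0, 0.75, 0.5], [0.5, 0.25, 0.125]]
N = [400.0, 137.5]          # "measured" stage counts
sigma = [10.0, 5.0]         # assumed measurement uncertainties
G = [1.0, 0.0, 0.0]         # functional: content of the first size interval
grid = range(0, 401, 20)    # nonnegative candidate values for each f_j

q0, g_lo, g_hi = eve_bounds(P, N, sigma, G, grid)
print(q0, g_lo, g_hi)
```

Because the problem is underdetermined, several grid points fit the data exactly, and the interval from `g_lo` to `g_hi` reflects the spread of the functional across all acceptable solutions rather than the uncertainty of a single fit.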



2.3 Activity-Weighted Size Distributions 

Activity-weighted size distributions have been measured in a number of
normally occupied houses [11-13] using an automated, semi-continuous
graded screen array (ASC-GSA) described by Ramamurthi [14] and
Ramamurthi and Hopke [15]. The ASC-GSA measurement system is a
diffusion battery that uses a combination of six sampler-detector units
operated in parallel. Each sampler-detector unit couples wire screen
penetration, filter collection, and activity detection with a solid
state detector in such a way as to minimize depositional losses. The
system samples air simultaneously in all of the units, with a flow of
about 15 lpm through the sampler slit between the detector and filter
holder section in each unit. The sampled air is drawn through a filter.
Complete details of the sampler are provided by Ramamurthi and Hopke
[15].

Computer control of sampling, counting, and 
analysis permits automated, semi-continuous oper- 
ation of the system with sampling every 1.5 h to 3 h. 
The activities of each radon progeny are estimated from alpha spectra
collected during two counting intervals: the first during sampling and
the second 20 min after the end of sampling. The observed
concentrations of ²¹⁸Po, ²¹⁴Pb, and ²¹⁴Bi are used to reconstruct the
corresponding activity-weighted size distributions using the
Expectation-Maximization algorithm [4] in six inferred size intervals
in geometric progression within the 0.5 nm to 500 nm size range. In
addition to the individual size distribution for each decay product,
the total airborne activity concentration can be characterized by the
Potential Alpha Energy Concentration (PAEC). The PAEC can be calculated
from the individual progeny concentrations by



$$ \mathrm{PAEC}\ (\mathrm{mJ\,m^{-3}}) = 5.79 \times 10^{-7}\, c_1 + 2.86 \times 10^{-6}\, c_2 + 2.10 \times 10^{-6}\, c_3 \tag{10} $$

where $c_1$, $c_2$, and $c_3$ are the activity concentrations of the
three radon decay products in Bq m⁻³.
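
Equation (10) is a simple weighted sum and can be written as a short function. This sketch assumes coefficients of 5.79 × 10⁻⁷, 2.86 × 10⁻⁶, and 2.10 × 10⁻⁶ mJ per Bq for ²¹⁸Po, ²¹⁴Pb, and ²¹⁴Bi, respectively; the function and argument names, and the input concentrations, are illustrative only.

```python
def paec_mj_per_m3(c_po218, c_pb214, c_bi214):
    """Potential Alpha Energy Concentration in mJ m^-3.

    Inputs are activity concentrations in Bq m^-3. The coefficients are
    the assumed potential alpha energy per unit activity of each decay
    product, expressed in mJ per Bq.
    """
    return (5.79e-7 * c_po218
            + 2.86e-6 * c_pb214
            + 2.10e-6 * c_bi214)

# Hypothetical progeny concentrations (Bq m^-3), not measured values.
example = paec_mj_per_m3(100.0, 50.0, 40.0)
print(f"PAEC = {example:.3e} mJ m^-3")
```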



2.4 Results

Measurements have been made in a number of 
houses in Northeastern North America. To illus- 
trate the use of the EVE(P) algorithm for deconvo- 
luting the activity size distributions, samples taken 
in houses in Arnprior, Ontario and Parishville, NY 
will be presented. In each home, radon and the size distributions of
each of the three decay products and the PAEC were determined at 2 h
intervals. The details of the experiments in Arnprior are given by
Hopke et al. [11]. In this home, radon concentrations were relatively
low (< 100 Bq m⁻³) and generally in the range of 25 Bq m⁻³ to
45 Bq m⁻³.
The cumulative probability distribution for PAEC is 
shown in Fig. 1. The outer boundary lines are the 
EVE(P) results for the 95% and 99% confidence in- 
tervals. The solid central line is the EM deconvolu- 
tion result. Although the specific solution obtained 
by the EM algorithm should fall within the EVE 
bounds, it may lie anywhere within the feasible re- 
gion. The confidence band will not necessarily be 
symmetrically distributed about the specific solution obtained by any
particular algorithm.

Fig. 1. Cumulative distribution for PAEC for a sample taken in an
occupied home in Arnprior, Ontario.



Another analysis was performed on samples from 
a home in Parishville, NY with much higher radon 
concentrations (500 Bq m⁻³ to 600 Bq m⁻³) and
thus, the bounds on the feasible region might be 
smaller [16]. The comparison of the EM size distri- 
bution with the EVE(P) distribution for PAEC is 
shown in Fig. 2. The EM-derived distribution does 
not appear to fully fit within the EVE(P) bounds. 
The question is then whether the current EVE(P) 
approach is the best description of the bounds on 
the feasible region. 

Consideration of extreme value statistics could lead to the following
suggestion: it might be possible to define some statistical properties
for the extreme members of the set of acceptable solutions, even when
there exists no general information about the probability distribution
of the solution. Such properties might help in better defining the
limits of the set of acceptable solutions. This could help in reducing
the confidence intervals of the EVE deconvolution technique without
sacrificing the reliability of estimation.

Fig. 2. Cumulative distribution for PAEC for a sample taken in an
occupied home in Parishville, NY.



3. Other Applications 

3.1 Ambient Air Quality Standard for PM10

In 1987, the U.S. Environmental Protection Agency promulgated a new
National Ambient Air Quality Standard (NAAQS) for airborne particulate
matter [17]. The standard defined a size-selected portion of the
ambient aerosol, particulate matter less than 10 µm or PM10, as
important for protecting human health, and introduced a new way of
determining when the standard had been violated. It is the form of the
24 h standard that involves extreme values. The standard requires that
samples taken over 24 h intervals not show more than 1 "expected
exceedance" of 150 µg m⁻³ per year averaged over a 3 year period.
Particle samples are not usually taken daily because of the manpower
needed to manually weigh unexposed filters, change them in the field,
and weigh the exposed filters again. A minimum sampling regime would
collect samples every 6th day. Thus, over a year approximately 61
samples might be collected. It is assumed that these samples are
independent and identically distributed (IID), and thus the number of
"expected exceedances" can be estimated as



$$ EE_i = OE_i \, \frac{n_i}{m_i} \tag{11} $$

where for a given year $i$, $EE_i$ is the number of estimated
exceedances, $OE_i$ is the number of observed exceedances, $m_i$ is the
number of samples taken, and $n_i$ is the number of days in the year.
Thus, if 61 samples are taken in a 365 day year, then 1 observed
exceedance becomes 6 expected exceedances. If this observed exceedance
is the only one that occurs during a 3 year interval, then the 6
expected exceedances are divided by 3 years to yield an average number
of expected exceedances of 2, which is greater than 1, and hence the
area is in nonattainment of the standard. In other words, the average
number of expected exceedances in any 3 year period is given by






$$ E = \frac{1}{3} \sum_{i=1}^{3} EE_i \tag{12} $$
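
The scaling of observed to expected exceedances and the 3 year averaging can be traced with the numbers used in the text. The function names below are illustrative, and the scenario (a single observed exceedance in the first of three years, with every-6th-day sampling) is the one described above.

```python
def expected_exceedances(observed, samples_taken, days_in_year=365):
    """Scale observed exceedances by the inverse of the sampling fraction."""
    return observed * days_in_year / samples_taken

def three_year_average(expected_by_year):
    """Average number of expected exceedances over a 3 year period."""
    return sum(expected_by_year) / 3

# One observed exceedance in year 1, none in years 2 and 3, 61 samples per year.
ee = [
    expected_exceedances(1, 61),
    expected_exceedances(0, 61),
    expected_exceedances(0, 61),
]
average = three_year_average(ee)
nonattainment = average > 1

print(ee[0], average, nonattainment)
```

A single measured exceedance is therefore enough to place the area in nonattainment under this sampling schedule.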



Davidson and Hopke [18] examined some of the 
problems that arise as a result of the application of 
such a standard given incomplete sampling. To il- 
lustrate the difficulties, the upper tail of the distri- 
bution of airborne mass concentrations will be 
represented by the following exponential distribu- 
tion: 

$$ F(c \le L) = 1 - \exp\left(-7.90\,\frac{L}{c} + 2.0\right) \tag{13} $$

or

$$ P(c > L) = \exp\left(2.0 - 7.90\,\frac{L}{c}\right) \tag{14} $$

where $c$ is the mass concentration of airborne particulate matter and
$L$ is the maximum concentration allowable under the standard. The
probability of the average number of exceedances being greater than 1
will be examined by evaluating $P(E > 1.05)$,

$$ \begin{aligned} P(E > 1.05) &= P\left(\sum EE_i / 3 > 1.05\right) \\ &= P\left(\sum EE_i \ge 3.15\right) \\ &= 1 - P\left(\sum EE_i < 3.15\right) \\ &= 1 - P\left(\sum OE_i < 3.15\, m/n\right) \end{aligned} \tag{15} $$

Thus, the probability of nonattainment classifica- 
tion is dependent on the number of measurements 
per year. 



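$$ P(E > 1.05) = \begin{cases} 1 - P\left(\sum OE_i = 0\right), & m \le \dfrac{n}{3.15} \\ 1 - P\left(\sum OE_i \le 1\right), & \dfrac{n}{3.15} < m \le \dfrac{2n}{3.15} \\ 1 - P\left(\sum OE_i \le 2\right), & \dfrac{2n}{3.15} < m \le \dfrac{3n}{3.15} \\ 1 - P\left(\sum OE_i \le 3\right), & \dfrac{3n}{3.15} < m \le n \end{cases} \tag{16} $$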

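The probabilities of observing 0 to 3 exceedances in any 1 year given
the chosen sampling frequency, denoted $P_0$ to $P_3$, can be estimated
using the exponential distribution. The corresponding probabilities for
the total over the 3 year period are given in Eq. (17).

$$ \begin{aligned} P\left(\sum OE_i = 0\right) &= P_0^{3} \\ P\left(\sum OE_i = 1\right) &= 3 P_0^{2} P_1 \\ P\left(\sum OE_i = 2\right) &= 3 P_0^{2} P_2 + 3 P_1^{2} P_0 \\ P\left(\sum OE_i = 3\right) &= P_1^{3} + 3 P_0^{2} P_3 + 6 P_0 P_1 P_2 \end{aligned} \tag{17} $$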
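
The combinatorial expressions for the 3 year totals can be checked numerically. The sketch below assumes that P_k is the probability of k observed exceedances in a single year, modeled as binomial in m IID samples with a hypothetical per-sample exceedance probability p (here 1/365 with m = 61). It compares the closed forms with a direct enumeration over the three years.

```python
from itertools import product
from math import comb

def yearly_pmf(m, p, kmax=3):
    """P_k for k = 0..kmax: k exceedances among m IID samples in one year."""
    return [comb(m, k) * p**k * (1 - p)**(m - k) for k in range(kmax + 1)]

def three_year_closed_form(P):
    """Probabilities of 0..3 total exceedances over three independent years."""
    P0, P1, P2, P3 = P
    return [
        P0**3,
        3 * P0**2 * P1,
        3 * P0**2 * P2 + 3 * P1**2 * P0,
        P1**3 + 3 * P0**2 * P3 + 6 * P0 * P1 * P2,
    ]

def three_year_enumerated(P):
    """Same probabilities, summing over every (year 1, year 2, year 3) outcome."""
    totals = [0.0] * 4
    for counts in product(range(4), repeat=3):
        s = sum(counts)
        if s <= 3:
            totals[s] += P[counts[0]] * P[counts[1]] * P[counts[2]]
    return totals

P = yearly_pmf(m=61, p=1 / 365)
closed = three_year_closed_form(P)
brute = three_year_enumerated(P)
for k, (a, b) in enumerate(zip(closed, brute)):
    print(k, a, b)
```

The coefficients 3, 3, 3, 1, 3, and 6 are simply the numbers of ways the yearly counts can be ordered across the three years.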

A plot of the probability of declaring an area in nonattainment as a
function of the number of samples taken per year is shown in Fig. 3.
For c < 1.0L, classification as nonattainment is a Type I error. For
c > 1.0L, the probability of proper classification represents the power
of the approach. The discontinuities occur because of the change in the
integer values of the number of expected exceedances that occur at
different n/m values. It can be seen that for an area that is exactly
in attainment (c = 1.0L), there is a probability of up to 60% that it
will be misclassified as nonattainment, depending on the number of
samples taken per year. This form of the standard, therefore, has a
high probability of a Type I error in order to attain a reasonable
power to identify real nonattainment areas.
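
The dependence on sampling frequency can be reproduced with a short calculation. The sketch assumes IID samples, n = 365 days, the same number of samples m in each of the 3 years, and a per-sample exceedance probability of exp(2.0 - 7.90 L/c). This reading of the exponential tail model is an assumption, chosen because it gives roughly one exceedance per 365 days when c = L. The function names are illustrative.

```python
from math import ceil, comb, exp

def exceedance_probability(c_over_l):
    """Assumed tail model: probability that one 24 h sample exceeds L."""
    return exp(2.0 - 7.90 / c_over_l)

def prob_nonattainment(m, p, n=365):
    """P(E > 1.05) for m samples per year over a 3 year period."""
    # Largest total observed count k satisfying k < 3.15 * m / n.
    k_max = ceil(3.15 * m / n) - 1
    trials = 3 * m
    p_attain = sum(comb(trials, k) * p**k * (1 - p)**(trials - k)
                   for k in range(k_max + 1))
    return 1.0 - p_attain

p = exceedance_probability(1.0)   # area exactly at the standard, c = L
curve = {m: prob_nonattainment(m, p) for m in range(1, 366)}
worst_m = max(curve, key=curve.get)

print(f"p = {p:.5f}")
for m in (61, 115, 116, 365):
    print(m, round(curve[m], 3))
print("largest misclassification probability at m =", worst_m)
```

Under these assumptions the misclassification probability rises with m until the allowed number of observed exceedances steps up from 0 to 1, where it drops sharply. The peak just before that step is roughly 0.6, in line with the value quoted above.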




Fig. 3. Probability of classifying an area as being in nonattainment of
the 24 h NAAQS for PM10 based on an exponential distribution model of
the tail.

The goal of this standard is to have the second highest actual value,
whether measured or not, be at or below the prescribed concentration.
Thus, alternative approaches that can more accurately estimate the
second highest value in the tail of an extreme value distribution would
potentially provide equal or greater power while lowering the
probability of making a misclassification error. Such an estimation
process would make the standard more efficient while maintaining or
possibly improving its effectiveness.

3.2 Most Exposed Individual 

Under the Clean Air Act Amendments of 1990,
Congress has mandated that major emission
sources of hazardous air pollutants, defined as ma- 
terials on a list of 189 substances given in the Act, 
must install emission control systems. After these 
systems are in place, the residual risk to the most 
exposed individual must be assessed. If the risk is 
found to be > 10⁻⁶, the EPA Administrator must
decide what additional steps, if any, are to be taken 
to reduce this risk. Previously the most exposed in- 
dividual (MEI) has been defined as a person living 
continuously at the fence line of the facility 200 m 
from the emission source for 70 years. The idea of
a 24 h per day, 70 year lifetime exposure for this in- 
dividual is obviously an overestimate of the real 
maximally exposed individual. Recently EPA has 
revised its guidelines for exposure assessment to 
support the development of a distribution of expo- 
sures that an individual might encounter. However, 
extreme value statistics is never mentioned in any of 
the discussions of the use of the upper tail of the 
distribution to examine exposure and thus risk to
the most exposed individual. Since the inaccurate 
estimation of the residual risk could result in sub- 
stantial costs for no health benefit if the maximum 
exposure is overestimated or result in death or ad- 
verse health effects if underestimated, the best 
statistical methodologies should be applied to this 
important estimation problem. This situation ap- 
pears ideally suited for extreme value statistics and 
thus should simultaneously provide interesting 
statistical problems to solve and value to society
by solving them properly. 

4. Conclusions 

There appear to be a number of areas in the air pollution field in
which rigorous application of extreme value methods could provide
useful contributions to solving important environmental problems. The
better estimation of the bounds for aerosol size distributions, the
determination of attainment or nonattainment of the NAAQS for PM10, and
exposure and risk assessments at the high end of the range of possible
exposures all could benefit from substantial involvement of extreme
value statistical expertise. It is hoped that this report will spark
interest in one or more of these problem areas.